Dictionary Alignment by Rewrite-based Entry Translation
Authors
Abstract
In this document we describe the process of aligning two standard monolingual dictionaries: a Portuguese language dictionary and a Galician synonym dictionary. The main goal of the project is to provide an online dictionary that shows, side by side, Portuguese and Galician definitions and synonyms for a given word, whether that word is written in Portuguese or in Galician. Because these two languages are very close to each other, we expect this idea to be viable. The main obstacle is the lack of a good, free translation dictionary between the two languages, namely one that covers lexicons of more than one hundred thousand different words. To solve this issue we defined a translation function, based on substitutions, that achieves an F1 score of 0.88 on a manually verified dictionary of nine thousand words. Using this same translation function for the Portuguese–Galician dictionary alignment, we aligned almost 50% of the dictionary lexicon (more than eighty thousand words).
1998 ACM Subject Classification: I.2.7 Natural Language Processing
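The idea of a substitution-based translation function can be illustrated with a minimal sketch. It assumes a handful of well-known Portuguese-to-Galician orthographic correspondences (for example "lh" to "ll" and "nh" to "ñ") and accepts a rewritten form only if it is attested in the Galician lexicon. The rule list, the names `RULES`, `candidates`, `translate` and `f1_score`, and the pair-level F1 definition are all hypothetical illustrations, not the authors' actual rule set or evaluation protocol.

```python
"""Sketch of a rewrite-based Portuguese -> Galician entry translator.

Assumptions (illustrative only, not the paper's actual rules):
- RULES holds a few well-known orthographic correspondences.
- A candidate is accepted only if it occurs in the Galician lexicon.
"""

# Hypothetical substitution rules: (Portuguese substring, Galician substring).
# Order matters: the longer plural ending is tried before the singular one.
RULES = [
    ("ções", "cións"),  # nações -> nacións
    ("ção", "ción"),    # nação -> nación
    ("lh", "ll"),       # filho -> fillo
    ("nh", "ñ"),        # vinho -> viño
    ("ss", "s"),        # passo -> paso
]


def candidates(word: str) -> set[str]:
    """Return every spelling obtained by applying any subset of RULES, in order."""
    results = {word}
    for source, target in RULES:
        # A rule may or may not fire, so keep both the original and rewritten forms.
        results |= {w.replace(source, target) for w in results if source in w}
    return results


def translate(word: str, galician_lexicon: set[str]) -> set[str]:
    """Keep only the candidate spellings attested in the Galician lexicon."""
    return candidates(word) & galician_lexicon


def f1_score(predicted: dict[str, set[str]], gold: dict[str, set[str]]) -> float:
    """Pair-level F1 between proposed and gold (word, translation) pairs."""
    pred_pairs = {(w, t) for w, ts in predicted.items() for t in ts}
    gold_pairs = {(w, t) for w, ts in gold.items() for t in ts}
    hits = len(pred_pairs & gold_pairs)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_pairs)
    recall = hits / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    lexicon = {"nación", "nacións", "fillo", "viño", "paso", "casa"}
    for pt in ["filho", "vinho", "nação", "passo", "casa"]:
        print(pt, "->", translate(pt, lexicon))
```

Filtering the rewritten forms against the target lexicon is what keeps over-generation in check: a rule that fires where it should not simply produces a string that is not a Galician word and is discarded. Words spelled identically in both languages (such as "casa") survive because the untouched form is always among the candidates.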